background: Endometriosis is a heritable common gynaecological condition influenced by multiple genetic and environmental factors.
Genome-wide association studies (GWASs) have proved successful in identifying common genetic variants of moderate effects for various
complex diseases. To date, eight GWAS and replication studies from multiple populations have been published on endometriosis. In this
review, we investigate the consistency and heterogeneity of the results across all the studies and their implications for an improved understanding
of the aetiology of the condition.

methods: Meta-analyses were conducted on four GWASs and four replication studies including a total of 11 506 cases and 32 678 controls,
and on the subset of studies that investigated associations for revised American Fertility Society (rAFS) Stage III/IV including 2859 cases.
The datasets included 9039 cases and 27 343 controls of European (Australia, Belgium, Italy, UK, USA) and 2467 cases and 5335 controls of
Japanese ancestry. Fixed and Han and Eskin random-effects models, and heterogeneity statistics (Cochran's Q test), were used to investigate
the evidence of the nine reported genome-wide significant loci across datasets and populations.

results: Meta-analysis showed that seven out of nine loci had consistent directions of effect across studies and populations, and six out of nine
remained genome-wide significant (P < 5 × 10^-8), including rs12700667 on 7p15.2 (P = 1.6 × 10^-9), rs7521902 near WNT4 (P = 1.8 × 10^-15),
rs10859871 near VEZT (P = 4.7 × 10^-15), rs1537377 near CDKN2B-AS1 (P = 1.5 × 10^-8), rs7739264 near ID4 (P = 6.2 × 10^-10)
and rs13394619 in GREB1 (P = 4.5 × 10^-8). In addition to the six loci, two showed borderline genome-wide significant associations with
Stage III/IV endometriosis, including rs1250248 in FN1 (P = 8 × 10^-8) and rs4141819 on 2p14 (P = 9.2 × 10^-8). Two independent
inter-genic loci, rs4141819 and rs6734792 on chromosome 2, showed significant evidence of heterogeneity across datasets (P < 0.005). Eight of
the nine loci had stronger effect sizes among Stage III/IV cases, implying that they are likely to be implicated in the development of moderate
to severe, or ovarian, disease. While three out of nine loci were inter-genic, the remaining were in or near genes with known functions of biological
relevance to endometriosis, varying from roles in developmental pathways to cellular growth/carcinogenesis.

conclusions: Our meta-analysis shows remarkable consistency in endometriosis GWAS results across studies, with little evidence of
population-based heterogeneity. The results also show that the phenotypic classifications used in GWAS to date have been limited. The stronger
associations with Stage III/IV disease observed for most loci emphasize the importance of including detailed sub-phenotype information in future studies.
Functional studies in relevant tissues are needed to understand the effect of the variants on downstream biological pathways.

Key words: endometriosis / genetics / GWAS / sub-phenotypes / heterogeneity 



Introduction 

Endometriosis is a common, estrogen-dependent, inflammatory condition
associated with chronic pelvic pain, subfertility and dysmenorrhoea
(Giudice and Kao, 2004; Berkley et al., 2005). Its estimated prevalence
rates range from 5-10% in women of reproductive age in the general
population to 35-50% among women with chronic pelvic pain and
subfertility (Eskenazi and Warner, 1997). The causes of the condition
are largely unknown, but are likely to be complex, involving multiple
environmental and genetic factors. Based on a study of 3096 twins, the
heritability of endometriosis, the proportion of disease variance due to
genetic factors, has been estimated at around 52% (Treloar et al., 1999).

To elucidate causal genetic variants underlying endometriosis, many
investigators have used so-called 'candidate gene' study approaches
over the past decades. Candidate gene study approaches are based on
a hypothesis, which can be biological or positional. In biological candidate
gene studies, genes with an inferred biological relevance to the disease
are selected and genetic variants in these genes are tested for association
with the disease of interest. In positional candidate gene studies, variants
and genes are selected on the basis of prior evidence that a specific
genomic region is implicated, for example through hypothesis-free
linkage studies described below. Few such positional candidate gene
studies have been performed in endometriosis (Treloar et al., 2007;
Zhao et al., 2008; Lin et al., 2011). Biological candidate gene studies in
endometriosis have been abundant and, similar to other complex diseases,
it is fair to say that they have been generally unsuccessful, with
limited replicated results (Montgomery et al., 2008; Rahmioglu et al.,
2012). Reasons for the general failure of candidate gene studies to elucidate
genetic mechanisms in complex disease are clear: (i) they are based
on a biological hypothesis that may not be true; (ii) only one or a few
genes in a relevant biological pathway are typically tested; (iii) usually
only a few variants in a gene are tested, and no attempt is made to
comprehensively cover the gene; (iv) cases and controls used are often poorly
defined, or definitions vary and (v) sample sizes are usually inadequate to
detect the effect sizes that are expected for variants influencing a
complex trait (Zondervan et al., 2002).

Rather than following hypotheses with little a priori probability of 
success, many investigators have turned to hypothesis-free approaches
to uncover genetic variants underlying disease on a genome-wide 
scale. There are two such hypothesis-free approaches: (i) family-based 
linkage studies and (ii) population-based genome-wide association 
studies (GWASs). Family-based linkage studies are aimed at identifying 
genomic regions harbouring genetic variants that are typically rare in 
the general population, and responsible for the aggregation of a 
disease in families with multiple affected individuals. 

Linkage studies have been very successful in identifying genetic
variants responsible for rare, monogenic disorders, in which a mutation
confers a very high risk of disease, but they have not generally been very
successful in complex diseases, where genetic variants confer modest
increases in susceptibility to disease. Exceptions are scenarios in
which families are identified with strong familial aggregation of what is
generally considered a complex disease. A well-known example of
such a scenario is the identification of the BRCA genes harbouring rare
variants conferring high risk of familial breast and ovarian cancer
(Easton, 1999; Pal et al., 2005; Campeau et al., 2008). In 2005, the
International Endogene Study published evidence of significant linkage of
endometriosis to chromosome 10q26 from analysis of 1176 affected
sister-pair families (Treloar et al., 2005); subsequent fine-mapping
association analyses of 10q26 suggested possible association of common
variants near CYP2C19 (Painter et al., 2011a, b), but rare variants that
could explain the linkage signal remain to be identified. In 2007, the
same study identified significant linkage to chromosome 7p13-15 in
a sub-analysis of 248 families with more than 3 affected members
(Zondervan et al., 2007), suggesting the presence of one or more rare
variants conferring high risk; analyses are ongoing to identify the variants
responsible (Lin et al., 2011).

Population-based GWASs are founded on the principle that common
diseases, such as endometriosis, are caused by genetic variants that are
common themselves (common disease-common variant hypothesis).
Thus, they can be seen as a complementary approach to linkage
studies, which are designed to find rare variants contributing to familial
disease. The principle of GWAS is simple, in that a set of several
hundred thousand common single nucleotide polymorphisms (SNPs or single
DNA base pair changes), selected to provide the maximum coverage
of the genome, is genotyped in a large set of cases and controls and
their frequencies are compared between the two groups. This SNP
selection is based on 'linkage disequilibrium' (LD) between SNPs, which
is the non-random correlation of genetic variants in a population that
exists due to shared ancestry of chromosomes. SNPs selected to 'tag'
(represent) parts of the genome can be used to predict the allelic
status of a genetic variant nearby because of shared ancestry of that
particular genomic segment. Thus, several hundred thousand tagSNPs can be used
to provide information on most of the ~10 million common SNPs
present in the human genome. In the past 5 years, GWAS have proved
to be very successful in identifying many common genetic variants
associated with complex disease, as demonstrated by the NHGRI Catalog
of published GWAS (http://www.genome.gov/gwastudies/; Fig. 1)
(Hindorff et al., 2009; Visscher et al., 2012). The current catalogue
includes association results for 11 751 SNPs from 1738 publications
(November 2013). Notably, 88% of identified SNPs are either in inter-genic
regions (43%) or located in intronic (non-coding) regions (45%) of genes,
demonstrating that the interpretation of the signals typically requires
further studies exploring the functionality of these regions (Hindorff
et al., 2009). Indeed, the ENCODE project has shown that ~80% of
non-coding regions are likely to have functionality regulating gene expression
(ENCODE Project Consortium, 2012).
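
The case-control comparison at the heart of a GWAS can be illustrated for a single SNP. The following is a minimal sketch, assuming a simple allelic test on a 2x2 table of allele counts; the function name and the counts are hypothetical and chosen for illustration only, and published GWASs typically use regression-based tests with covariates rather than this basic form.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Compare risk-allele counts between cases and controls (2x2 table).

    Returns the odds ratio, its 95% confidence interval, the 1-df
    chi-square statistic and the corresponding P-value.
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)

    # Woolf standard error of log(OR).
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    # Pearson chi-square for a 2x2 table (no continuity correction).
    n = case_risk + case_other + control_risk + control_other
    numerator = n * (case_risk * control_other - case_other * control_risk) ** 2
    denominator = (
        (case_risk + case_other)
        * (control_risk + control_other)
        * (case_risk + control_risk)
        * (case_other + control_other)
    )
    chi_square = numerator / denominator

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, (ci_low, ci_high), chi_square, p_value


# Conventional genome-wide significance threshold.
GENOME_WIDE_THRESHOLD = 5e-8

# Hypothetical allele counts (two alleles per person), for illustration only.
or_, ci, chi2, p = allelic_association(600, 1400, 500, 1500)
print(f"OR = {or_:.2f} (95% CI: {ci[0]:.2f}-{ci[1]:.2f}), P = {p:.2e}")
print("genome-wide significant:", p < GENOME_WIDE_THRESHOLD)
```

Because several hundred thousand SNPs are tested, a nominal P < 0.05 is not sufficient; the stringent threshold reflects the number of effectively independent tests across the genome.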

GWAS of endometriosis 

In 2010, the first endometriosis GWAS was published on a Japanese
dataset of 1907 cases and 5292 controls (Uno et al., 2010), providing
genome-wide significant evidence for association of a variant in
CDKN2B-AS1 (cyclin-dependent kinase inhibitor 2B antisense RNA)
[rs10965235; odds ratio (OR) = 1.44 (95% confidence interval (CI):
1.30-1.59), P = 5.57 × 10^-12]. This publication was quickly followed
by that of a smaller Japanese GWAS of 696 cases and 825 controls
that did not find a significant signal (Adachi et al., 2010). The first
GWAS in women of European ancestry was published in 2011 by the




Figure 1 Schematic overview of all genome-wide associations (P < 5 × 10^-8) across all chromosomes (small window) and endometriosis associations
(large window), presented in 17 trait categories (colour coded), generated from the NHGRI GWA Catalog (Hindorff et al., 2009). Available at:
www.genome.gov/gwastudies (accessed 11 October 2013).

International Endogene Consortium (IEC), involving 3194 surgically
confirmed cases and 7060 controls from Australian and UK datasets, with
independent replication in a US dataset of 2392 cases and 2271 controls
(Painter et al., 2011a, b). This study provided genome-wide significant
evidence for an inter-genic locus on chromosome 7 [rs12700667;
OR = 1.22 (95% CI: 1.13-1.32), P = 1.4 × 10^-9], and combined
with the published results from Uno et al. (2010), for association with
an SNP near WNT4 (wingless-type MMTV integration site family
member 4) [rs7521902; OR = 1.19 (95% CI: 1.12-1.27), P = 4.2 × 10^-8].
Both signals showed much stronger evidence for association
with moderate-to-severe [revised American Fertility Society (rAFS)
Stage III/IV] endometriosis: rs12700667, OR = 1.38 (95% CI: 1.24-1.53);
rs7521902: OR = 1.25 (95% CI: 1.12-1.27). The IEC study could
not replicate the signal for rs10965235 seen in the Japanese GWAS by
Uno et al. (2010), as this variant was monomorphic (non-variable) in
individuals of European ancestry, nor could they find association with variants
nearby. In 2012, a meta-analysis of summary results, combining the evidence
from the Japanese and IEC GWAS datasets, was conducted. This analysis
confirmed the three loci published by the two original papers, and provided
evidence for a further four (Nyholt et al., 2012): rs10859871 near VEZT
[OR = 1.20 (95% CI: 1.14-1.26), P = 5.1 × 10^-13]; rs4141819 in an
inter-genic region on 2p14 [OR = 1.15 (95% CI: 1.09-1.21), P = 8.5 × 10^-8];
rs7739264 near ID4 (inhibitor of DNA binding 4) [OR = 1.17
(95% CI: 1.11-1.23), P = 3.6 × 10^-10] and rs1537377 near
CDKN2B-AS1 [OR = 1.15 (95% CI: 1.10-1.21), P = 2.4 × 10^-9].

Since the Nyholt et al. (2012) meta-analysis, a fourth GWAS in women
of European ancestry was published, involving 2019 cases and 14 471
controls from the USA (Albertsen et al., 2013). This large study reported
lack of significant replication of the IEC signal for rs12700667 on
chromosome 7 (P = 0.12), and found two genome-wide significant signals: (i)
near WNT4 (rs2235529; P = 8.65 × 10^-9), tagging the same locus for
which association was reported by the IEC/Japanese meta-analyses
(r^2 = 0.66 with rs7521902) and (ii) a novel intergenic signal 280 kb
upstream of RND3-RBM43 (rs1519761; P = 4.70 × 10^-8). In addition,
two replication studies were published, which genotyped some of the
implicated SNPs in women of European ancestry. A study involving
1129 surgically confirmed cases and 831 controls from Belgium
(Sundqvist et al., 2013) did not find significant evidence either for rs12700667 on
chromosome 7 (P = 0.46), or for rs7521902 in WNT4 (P = 0.17).
A second replication study in 305 surgically confirmed cases and 2710
controls from Italy (Pagliardini et al., 2013) also could not replicate
rs12700667 (P = 0.80) but did find significant evidence for WNT4
(rs7521902, P = 5.6 x 10"^), FN1 (fibronectin 1) (rs1250248, P =
9.0 x 10"^) and an SNP in CDKN2B-AS1 (rs1333049, P = 1.7 x
I0~^). Results in these more recent papers have been interpreted as
showing evidence for heterogeneity in the genetic loci underlying
endometriosis in different populations, even when sampled from women with
similar ethnic ancestry.

Sources of heterogeneity 

Genetic population heterogeneity 

Association studies assume that the allele frequency differences of
genetic variants (SNPs) observed between cases and controls reflect
genetic factors underlying the disease. Common SNPs (population
allele frequency > 0.05) genotyped throughout the genome for the
association studies are selected on the basis of LD. Different patterns of LD
exist between populations due to differing mutational events and
selection pressures experienced by different populations throughout history.
The International HapMap Project has determined LD patterns of SNPs
across the genome through characterizing the genetic variants, and their
frequencies and correlations, in DNA samples from 11 different ethnic
populations (The International HapMap Consortium, 2005, 2010).
This has enabled the discovery of common genetic risk variants indirectly
through testing tagSNPs that are highly predictive of the status of other
SNPs, providing very dense coverage of the genome without genotyping
all the variants in the region of interest, in each reference population.

Due to population history differences, some of the disease risk variants
implicated in one ethnic population may not be associated, or may be
associated to a greater or lesser extent, with disease risk in another
population. This problem is worse for more recent mutations, for deleterious
mutations that are found at lower frequencies and which tend not to
be shared between populations (making most of the rare mutations
private to different ethnic populations), and for those that affect traits with
reduced reproductive fitness/fertility. Arguably, endometriosis could
be an example of such a trait, although it is unknown whether
endometriosis would have affected the fertility of women in early reproductive
years in previous centuries.

Phenotypic heterogeneity: disease definition

Variability in phenotypic characterization of endometriosis cases 
between studies is likely to contribute to the heterogeneity in findings, 
and may result in genetic risk variants going undetected due to dilution 
of the strength of association. Variability may arise because of differences 
in the proportions of endometriosis sub-phenotypes such as endome- 
trioma, recto-vaginal and peritoneal disease, or different frequencies of 
rAFS stage (American Society for Reproductive Medicine, 1997), which 
may have different genetic origins. In addition, some GWASs have included 
endometriosis cases that were diagnosed through methods other than the
gold standard of laparoscopic surgery, such as ultrasound imaging or clinical
symptoms. Furthermore, women with endometriosis in the studies
may vary according to pain or subfertility phenotypes which, if available 
at all, are likely to have been assessed through different, non-standardized, 
means. Also, the route to diagnosis can vary greatly between clinics and 
countries, due to referral patterns and cultural and socio-economic differ- 
ences, all of which could introduce heterogeneity in terms of the type of 
cases that are included in the studies. 

Given the suggested heterogeneity in genetic signals from endometri- 
osis GWAS and replication studies, and the potential for genetic and 
phenotypic variability to influence results, we investigated the heterogen- 
eity and consistency of results across all published GWAS and replication 
datasets from Australia, Belgium, Italy, Japan, the UK and the USA 
through meta-analyses of 11 SNPs in 9 loci that reached genome-wide
significance in at least 1 study.

Methods 

Descriptions of GWAS and replication studies 
and populations 

For this meta-analysis, we identified all the GWAS of endometriosis and
replication studies published up to 1 December 2013. A systematic literature
search in PubMed for English language publications was performed using
the terms 'endometriosis' and 'GWAS' and/or 'replication'. Table I shows

Table I Summary of the eight published endometriosis GWAS and replication studies included in the meta-analysis.

Cohort                    Ancestry                No. of cases   No. of Stage III/IV cases(a)   No. of controls   References
OX GWAS                   European (UK/USA/EU)    924            454                            5190              Painter et al. (2011a)
QIMR GWAS                 European (Australia)    2270           908                            1870              Painter et al. (2011a)
Utah GWAS                 European (USA)          2019           848                            14 471            Albertsen et al. (2013)
NHS II replication        European (USA)          2392           No stage info                  2271              Painter et al. (2011a)
Pagliardini replication   European (Italy)        305            220                            2710              Pagliardini et al. (2013)
Sundqvist replication     European (Belgium)      1129           429                            831               Sundqvist et al. (2013)
Total European ancestry                           9039           2859                           27 343
BBJ GWAS                  Japanese                1423           No stage info                  1318              Uno et al. (2010)
BBJ replication           Japanese                1044           No stage info                  4017              Uno et al. (2010)
Total Japanese ancestry                           2467                                          5335
Total                                             11 506         2859                           32 678

(a) rAFS Stage III and IV disease only. OX, Oxford University; QIMR, Queensland Institute of Medical Research; NHS II, Nurses' Health Study II; BBJ, BioBank Japan.



all four GWAS and four replication datasets included in the meta-analysis, and
their population origins. One GWAS dataset was of Japanese origin, and
consisted of 1423 cases and 1318 controls obtained from the BioBank Japan (Uno
et al., 2010). Individuals were genotyped on the Illumina 550 K BeadChip
array. Cases were diagnosed through the presence of multiple clinical
symptoms, physical examinations and/or laparoscopic surgery; no information on
disease stage or other sub-phenotypic disease data was available. The
remaining three GWAS datasets are of European ancestry, with all cases
laparoscopically confirmed: 2270 (40% rAFS Stage III/IV) cases and 1870
controls from Australia ('QIMR'); 924 (49% Stage III/IV) cases and 5190 controls
from the UK ('OX') (Painter et al., 2011a), and 2019 (42% Stage III/IV) cases
and 14 471 controls from the USA ('UTAH') (Albertsen et al., 2013). The
control sets in all GWAS datasets were population based and unscreened
for endometriosis cases. The IEC Australian and UK datasets were
genotyped on Illumina 610/670 K and 670 K/1 M chips, respectively, while the
US dataset was genotyped using Illumina OmniExpress 730 K BeadChip.
The IEC datasets were imputed up to 1000 Genomes Pilot reference panel
(B36, June 2010). From the SNPs included in this meta-analysis, only
rs1333049 was an imputed SNP in the QIMR Australian dataset.

Of the four replication studies, one was conducted in women of Japanese,
and three in women of European ancestry (Table I). The datasets comprised:
(i) 2392 self-reported surgically diagnosed cases (no disease stage
information) and 2271 unscreened population controls from the Nurses' Health
Study II (NHS II) in the USA (Painter et al., 2011a); (ii) 305 laparoscopically
confirmed cases (72% Stage III/IV cases) and 2710 population-based
controls (90% unscreened for endometriosis) from Italy (Pagliardini et al.,
2013); (iii) 1129 laparoscopically confirmed cases (38% Stage III/IV cases)
and 831 laparoscopy-negative controls from a subfertility clinic population
in Belgium (Sundqvist et al., 2013) and (iv) 1044 cases (no disease stage
information) and 4017 unscreened population controls from BioBank Japan (Uno
et al., 2010); 653 of these 1044 cases had been used in a small GWAS of 696
cases and 825 controls, which had not provided a genome-wide significant
result (Adachi et al., 2010).

Meta-analysis of genome-wide significant 
results across the studies 

We performed a meta-analysis of association results for SNPs passing the
P-value threshold < 5 × 10^-8 ('genome-wide significance') in at least one
of the studies, across all of the endometriosis GWAS and replication datasets,
including a total of 11 506 cases and 32 678 controls, employing a fixed-effect
model in the first instance, using GWAMA software (Magi and Morris, 2010).
ORs and 95% CIs and the sample size of the studies for each SNP were used
as input to the model. SNPs that maintained a P-value < 5 × 10^-8 on
meta-analysis were considered genome-wide significant. Given the evidence
for a substantially increased contribution of genetic factors to moderate/
severe (rAFS Stage III/IV) compared with minimal/mild (rAFS Stage I/II)
endometriosis (Painter et al., 2011a, b), additional meta-analyses were
conducted on 'Stage III/IV enriched' and 'Stage III/IV only' datasets. The
'Stage III/IV-enriched' analysis included association results from QIMR,
OX, UTAH, Pagliardini and Sundqvist Stage III/IV cases (n = 2859) versus
controls, combined with all-stage endometriosis versus controls association
results from the BBJ, BBJ replication and NHS II replication datasets
(for which stage information on cases was not available). Thus, the 'Stage
III/IV-enriched' analysis was based on 7718 cases and 32 678 controls.
The 'Stage III/IV-only' analysis included association results from 2859 Stage
III/IV cases (QIMR, OX, UTAH, Pagliardini and Sundqvist) versus 32 678
controls.
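
The fixed-effect step can be sketched as an inverse-variance weighted average of per-study log odds ratios. This is a minimal illustration under stated assumptions, not the GWAMA implementation: the function names are hypothetical, the standard error is recovered from the reported 95% CI on the log scale, and the three study inputs are invented for illustration.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Convert an OR and its 95% CI into a log odds ratio and standard error."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return beta, se


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect meta-analysis.

    `studies` is a list of (OR, CI lower bound, CI upper bound) tuples.
    """
    sum_weights = 0.0
    sum_weighted_beta = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        beta, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1 / se ** 2  # more precise studies receive more weight
        sum_weights += weight
        sum_weighted_beta += weight * beta

    pooled_beta = sum_weighted_beta / sum_weights
    pooled_se = math.sqrt(1 / sum_weights)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return {
        "or": math.exp(pooled_beta),
        "ci": (
            math.exp(pooled_beta - 1.96 * pooled_se),
            math.exp(pooled_beta + 1.96 * pooled_se),
        ),
        "se": pooled_se,
        "p": p_value,
    }


# Hypothetical per-study results for one SNP, for illustration only.
example = [(1.20, 1.05, 1.37), (1.15, 1.02, 1.30), (1.08, 0.95, 1.23)]
result = fixed_effect_meta(example)
print(f"pooled OR = {result['or']:.2f}, P = {result['p']:.2e}")
print("genome-wide significant:", result["p"] < 5e-8)
```

Under this model every study is assumed to estimate the same underlying effect, which is why heterogeneity is examined separately.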

Heterogeneity of allelic effects across studies was examined using the
Cochran's Q test (Cochran, 1954). Between-study heterogeneity was
indicated by Q statistic P values < 0.1 (Ioannidis et al., 2007), as well as the I^2
index, which indicates the percentage of variance attributable to
heterogeneity (Higgins and Thompson, 2002). Given that the nine loci represented
nine independent tests, a Bonferroni correction (0.05/9) was applied
to the threshold for significant evidence of heterogeneity (P < 0.005).
Meta-analysis of SNPs that showed evidence of effect heterogeneity was
also carried out using the Han and Eskin random-effects model, which
increases power to detect associations under heterogeneity, implemented
in METASOFT software (Han and Eskin, 2011). Beta effect sizes, standard
errors and sample sizes for each SNP were used for input to the model.
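
The heterogeneity statistics described above can be sketched as follows. This is a minimal illustration assuming per-study log odds ratios (betas) and standard errors as input; the function names and the two-study example are hypothetical, and the Han and Eskin random-effects model implemented in METASOFT is not reproduced here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def heterogeneity(betas, standard_errors):
    """Return Cochran's Q, the I^2 index (%) and the Q-test P-value."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i_squared, chi2_sf(q, df)


# Bonferroni-corrected threshold for nine independent loci.
BONFERRONI_THRESHOLD = 0.05 / 9

# Hypothetical, strongly discordant pair of studies, for illustration only.
q, i2, p = heterogeneity([0.0, 0.4], [0.1, 0.1])
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%, P = {p:.4f}")
print("significant heterogeneity:", p < BONFERRONI_THRESHOLD)
```

The exact Bonferroni value 0.05/9 is about 0.0056, which the text rounds to P < 0.005.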

The variance explained by the established loci associated with endometri- 
osis was calculated by transformation of dichotomous disease risk onto 
a continuous liability scale that assumes a disease prevalence rate and a 
multiplicative model (Wray et al., 2010; Morris et al., 2012).

Investigation of other reported GWAS 
associations in endometriosis loci 

For each endometriosis SNP included in the meta-analysis, we determined
the chromosomal band it is located in, using UCSC build 19. We then
extracted all GWAS that reported genome-wide significant SNP associations
with any disease or trait for the respective chromosomal segments from the
NHGRI resource (http://www.genome.gov/gwastudies/) (Hindorff et al.,
2009). We determined the distance between each endometriosis SNP and
SNPs associated with other diseases/traits in each chromosomal segment,
and assessed LD (correlation, r^2) between each SNP pair in the Caucasian
and Japanese populations using the 1000 Genomes Pilot CEU and JPT data
reference panels, implemented in Haploview (Barrett et al., 2005). An
r^2 > 0.2 was used as the threshold for SNPs that were in moderate LD
(1000 Genomes Project Consortium, 2010).
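
The pairwise LD measure can be illustrated from its definition. The sketch below assumes phased haplotype frequencies are already available; the function name and the example frequencies are hypothetical, and the sketch is not a substitute for the estimation that Haploview performs on reference panel genotypes.

```python
import math


def ld_r_squared(freq_ab, freq_a, freq_b):
    """Squared correlation (r^2) between alleles A and B at two SNPs.

    freq_ab: frequency of haplotypes carrying both allele A and allele B
    freq_a:  frequency of allele A at the first SNP
    freq_b:  frequency of allele B at the second SNP
    """
    d = freq_ab - freq_a * freq_b  # disequilibrium coefficient D
    denominator = freq_a * (1 - freq_a) * freq_b * (1 - freq_b)
    if denominator == 0:
        # r^2 is undefined when either SNP is monomorphic in the population.
        return math.nan
    return d * d / denominator


# Threshold for moderate LD used in the text.
MODERATE_LD_THRESHOLD = 0.2

# Hypothetical haplotype and allele frequencies, for illustration only.
r2 = ld_r_squared(freq_ab=0.25, freq_a=0.30, freq_b=0.35)
print(f"r^2 = {r2:.2f}, moderate LD: {r2 > MODERATE_LD_THRESHOLD}")
```

The undefined case matters in practice: an SNP that is monomorphic in one ancestry group cannot be used to assess LD, or be tested for association, in that group.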

Results 

Consistency and heterogeneity of genetic 
associations between studies and populations 

Of the 11 SNPs in 9 loci associated with endometriosis in at least one of
the 8 studies, the meta-analysis showed genome-wide significant evidence
(P < 5 × 10^-8) for seven SNPs (six loci), with consistent directions
of effect across studies and populations (Table II, Fig. 2). These
included rs7521902 near WNT4 [OR = 1.18 (95% CI: 1.13-1.23), P =
1.8 × 10^-15]; rs13394619 in GREB1 (growth regulation by estrogen in
breast cancer 1) [OR = 1.13 (95% CI: 1.07-1.20), P = 2.9 × 10^-8];
rs7739264 near ID4 [OR = 1.11 (95% CI: 1.08-1.15), P = 6.2 × 10^-10];
both rs12700667 [OR = 1.10 (95% CI: 1.06-1.14), P = 1.6 × 10^-9]
and rs7798431 [OR = 1.13 (95% CI: 1.09-1.18), P = 5.4 × 10^-9]
in an inter-genic region on chromosome 7 (correlation
between SNPs = 0.8); rs1537377 near CDKN2B-AS1 [OR = 1.12
(95% CI: 1.08-1.17), P = 1.0 × 10^-8] and rs10859871 [OR = 1.18
(95% CI: 1.13-1.22), P = 4.8 × 10^-15] in VEZT. Interestingly,
rs1250248 in FN1 showed a strong association with Stage III/IV
disease (P = 8.0 × 10^-8), had consistent directions of effect across all
studies and populations, but just fell short of the genome-wide
significance threshold.

The results show that genetic heterogeneity between populations is not
governing these six confirmed endometriosis loci. For three of the seven
SNPs, the between-study heterogeneity Q test P-value was borderline
significant (0.01 < P < 0.1) when considering that nine independent tests
were conducted; however, the results from the random-effects model
allowing for heterogeneity showed negligible difference from those
obtained using fixed-effect meta-analysis, and they reached genome-wide
significance under both models, demonstrating the lack of evidence for
significant heterogeneity of signals between studies (Table II). One of
the directionally consistent loci is rs12700667, in an inter-genic region
of high LD (48 kb; r^2 > 0.8) with rs7798431 on chromosome 7. The
previously reported lack of significant association of rs12700667 and
rs7798431 variants in the individual Belgian (Sundqvist et al., 2013),
Italian (Pagliardini et al., 2013) and Utah datasets (Albertsen et al., 2013)
is therefore likely to be due to stochastic fluctuations that typically
can be seen between individual datasets. It also highlights an important
point about the need to interpret study results as probabilities, and
demonstrates the complex issue of what (lack of) 'replication' means
when a fixed P-value threshold such as P < 0.05 is used. An association
P > 0.05 does not imply that the variant is not associated;
rather, it means that, assuming the variant has no effect, a result at least
as extreme as the one observed would be expected in more than 5% of
repetitions of the same study.

Two inter-genic loci, both on chromosome 2, showed evidence of
heterogeneity (P < 0.005) between studies: rs4141819 (closest gene:
ETAA1) and rs6734792 (closest gene: RND3). Random-effects analyses
substantially increased the significance of rs4141819, most notably for
association with Stage III/IV disease (enriched analysis: P = 9.2 × 10^-8),
but had little effect on association with rs6734792 (all
endometriosis: P = 2.2 X 10"^).

An intriguing genomic region, because of the differences in allele
frequencies and LD structure between individuals of Japanese and
European ancestry, is the CDKN2B-AS1 locus on chromosome 9
(Table II, Fig. 3). In the first Japanese GWAS of endometriosis by Uno
et al. (2010), the authors reported association with rs10965235 in the
CDKN2B-AS1 gene. This SNP, however, is monomorphic in individuals
of European ancestry, and therefore direct replication of this signal was
not possible in datasets of this ancestry. However, in the Nyholt et al.
meta-analysis of the BBJ, QIMR and OX datasets, genome-wide
significant association was reported across these datasets for rs1537377
[P = 1.0 × 10^-8, OR = 1.22 (95% CI: 1.14-1.30)], 55 kb away from
rs10965235. LD (correlation) between rs10965235 and rs1537377 in
the Japanese HapMap reference population is very low (r^2 = 0.01)
(Fig. 3). Nyholt et al. (2012) conducted conditional association analysis
in the Japanese dataset, providing suggestive evidence for rs1537377
to be a second, independent, risk variant in the CDKN2B-AS1 region
for endometriosis. Our present meta-analysis certainly suggests that
rs1537377 is a risk variant across Japanese and European ancestry
datasets, showing the strongest association with Stage III/IV disease
(Stage III/IV enriched: P = 5.8 x 10"'^). In the Italian replication
study, Pagliardini et al. genotyped rs1333049 in the CDKN2B-AS1
region, based on its location in the same Japanese LD block as
rs10965235 (Fig. 3) and on its common frequency in the Italian
population [HapMap minor allele frequency in Tuscans in Italy (TSI) = 0.48].
In our meta-analysis including unpublished data from UK (OX GWAS)
and Australian (QIMR GWAS) datasets, rs1333049 did not reach
genome-wide significance (Stage III/IV enriched: P = 0.05).

Assuming a population prevalence for endometriosis of 8%
(Zondervan et al., 2001, 2002; Missmer et al., 2004), these nine loci
together account for 1.67% of variance in 'all' endometriosis
susceptibility (Wray et al., 2010). Therefore, they currently have no
immediate role in risk prediction.



Results for clinical sub-phenotypes 

Of the seven genome-wide significantly associated SNPs in our
meta-analysis (six loci), six SNPs (five loci) showed an increasing effect
size (OR) as the proportion of cases with Stage III/IV disease included
in analyses increased from all endometriosis to 'Stage III/IV-enriched'
to 'Stage III/IV-only' (Table II). Figure 2 shows forest plots for the
results of these loci from individual studies, for association with all
endometriosis and Stage III/IV disease, highlighting the point that for
most loci, effect sizes were greater for association with Stage III/IV
disease. These results imply that the loci are likely to be implicated
predominantly in the development of moderate/severe disease. The most
striking example of an association that came close to genome-wide
significance when cases were limited to the much smaller subset with known
Stage III/IV disease (n = 2859) is rs1250248 in FN1, which had an
OR of 1.11 (95% CI: 1.04-1.18, P = 1.1 × 10⁻³) with all
endometriosis, and an OR of 1.27 (95% CI: 1.16-1.38, P = 8.0 × 10⁻⁸) with
Stage III/IV disease.
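
As a rough consistency check on figures like those quoted above, a reported OR and 95% CI can be converted back to an approximate standard error, z-score and two-sided P-value. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale; the function name is illustrative and is not taken from any of the studies.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate z-score and two-sided P-value from an OR and its 95% CI.

    Assumes a Wald interval, i.e. one that is symmetric on the log scale.
    """
    z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# FN1 rs1250248, using the OR and CI values quoted in the text
z_all, p_all = p_from_or_ci(1.11, 1.04, 1.18)  # all endometriosis
z_sev, p_sev = p_from_or_ci(1.27, 1.16, 1.38)  # Stage III/IV disease
print(f"all endometriosis: z = {z_all:.2f}, P = {p_all:.1e}")
print(f"Stage III/IV:      z = {z_sev:.2f}, P = {p_sev:.1e}")
```

Because the published OR and CI are rounded to two decimal places, the recovered P-values can only approximate the reported ones.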



Table II Results of the meta-analysis of the 11 published SNPs genome-wide significantly associated with endometriosis in at least one study.

Chr  SNP         Position   RA  RAF   RAF   Case             P                            Phet    I²    OR (95% CI)       Nearest gene
                 (HG19)         CEU   JAP   selection                                                                     (distance)

1    rs7521902   22490474   A   0.23  0.33  All              1.8 X 10"'^                  0.83    0     1.18 (1.13-1.23)  WNT4 (21 kb)
                                            III/IV enriched  2.7 X 10"'^                  0.81    0     1.23 (1.17-1.28)
                                            III/IV only      1.8 X 10"'°                  0.80    0     1.25 (1.16-1.33)
2    rs13394619  11727257   G   0.53  0.42  All              4.5 X I0"'*(2.9 X lO"**)     0.04*   56.1  1.13 (1.07-1.20)  GREB1 (0)
                                            III/IV enriched  3.5 X 10"**                  0.13    39.8  1.15 (1.09-1.20)
                                            III/IV only      2.1 X 10"^                   0.34    25.6  1.18 (1.11-1.24)
2    rs4141819   67864425   C   0.27  0.22  All              2.1 X 10"'' (8.8 X lO"*")    0.004*  70.6  1.08 (1.04-1.12)  Intergenic (ETAA1; 227 kb)
                                            III/IV enriched  2.5 X 10"^ (9.2 X lO"**)     0.02*   66.0  1.15 (1.09-1.21)
                                            III/IV only      6.9 X 10""^ (1.0 X lO"*")    0.003*  81.9  1.16 (1.09-1.24)
2    rs1250248   216286843  A   0.21  0.03  All              1.1 X 10"''                  0.21    23.0  1.11 (1.04-1.18)  FN1 (0)
                                            III/IV enriched  1.3 X 10"^                   0.86    0     1.13 (1.07-1.19)
                                            III/IV only      8.0 X 10"**                  0.86    0     1.26 (1.16-1.38)
2    rs6734792   151624632  C   0.38  0.32  All              9.7 X 10"^ (2.2 X 10""^)     0.003*  78.3  1.10 (1.06-1.16)  Intergenic (RND3: 281 kb)
                                            III/IV enriched  5.5 X 10"^                   0.13    49.8  1.08 (1.02-1.15)
                                            III/IV only      6.5 X 10"^ (2.7 X 10"^)      0.002*  84.4  1.10 (1.05-1.15)
6    rs7739264   19785338   T   0.55  0.77  All              1.9 X I0""'(6.2 X 10"'°)     0.05*   50.9  1.11 (1.08-1.15)  ID4 (52 kb)
                                            III/IV enriched  6.7 X I0"'°(3.l X 10"'°)     0.02*   61.0  1.14 (1.10-1.19)
                                            III/IV only      1.2 X 10"**                  0.56    0     1.20 (1.13-1.28)
7    rs12700667  25901389   A   0.76  0.20  All              1.9 X I0"'(l.6 X 10"')       0.05*   51.0  1.13 (1.08-1.17)  Intergenic (NFE2L3: 290 kb)
                                            III/IV enriched  7.0 X 10"" (4.2 X 10"")      0.02*   57.9  1.17 (1.11-1.22)
                                            III/IV only      4.5 X I0"'*(3.6 X lO"**)     0.02*   69.5  1.22 (1.14-1.31)
7    rs7798431   25860562   G   0.79  0.49  All              5.4 X 10"'                   0.14    37.0  1.13 (1.09-1.18)  Intergenic (NFE2L3: 331 kb)
                                            III/IV enriched  8.5 X 10"'°                  0.15    40.8  1.19 (1.13-1.26)
                                            III/IV only      9.7 X 10"**                  0.10    58.3  1.24 (1.14-1.33)
9    rs1537377   22169450   C   0.40  0.39  All              1.0 X lO"**                  0.30    18.8  1.12 (1.08-1.17)  CDKN2B-AS1 (48 kb)
                                            III/IV enriched  5.8 X 10"'^                  0.25    23.8  1.18 (1.13-1.23)
                                            III/IV only      8.1 X 10"" (2.3 X 10"')      0.09*   58.1  1.18 (1.11-1.26)
9    rs1333049   22125253   G   0.54  0.02  All              0.25 (0.12)                  0.01*   77.3  1.04 (0.98-1.10)  CDKN2B-AS1 (4 kb)
                                            III/IV enriched  0.05 (0.03)                  0.03*   70.9  1.08 (1.00-1.17)
                                            III/IV only      0.55 (0.60)                  0.47    0     1.03 (0.94-1.12)



Genetic variants underlying endometriosis risk 




Current understanding of biological 
mechanisms of the genetic loci 

Figure 4 depicts the genes nearest to the nine loci (eleven SNPs) analysed
in our meta-analysis, showing the location of each SNP in relation to the
gene and its functional structures. Also shown is the location of known
SNPs within the gene reported to be genome-wide associated with
other traits and diseases (source: NHGRI GWAS database,
http://www.genome.gov/gwastudies/; November 2013) that are in LD
(r² > 0.2) with any of the endometriosis SNPs. A comprehensive list
of all published GWAS variants that are located in the genes but do
not correlate (r² < 0.2) with endometriosis SNPs, or are in LD with
endometriosis variants outside genes, is provided in Supplementary
data, Table SI.
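
The r² threshold used above is the squared correlation between alleles at two SNPs. A minimal sketch of how it is computed from haplotype and allele frequencies follows; the frequencies are hypothetical and chosen only for illustration, and the function name is not from any particular software package.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and of allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical frequencies, for illustration only
r2 = ld_r2(p_ab=0.32, p_a=0.40, p_b=0.50)
print(f"r^2 = {r2:.2f}; exceeds the 0.2 threshold: {r2 > 0.2}")
```

When the haplotype frequency equals the product of the allele frequencies, D and therefore r² are zero, which corresponds to the two SNPs being uncorrelated.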

A total of four SNPs in three genetic loci, rs12700667/rs7798431 on
chromosome 7p15.2, rs4141819 on 2p14 and rs6734792 on 2q23.3, are
in inter-genic regions with no known genes within 200 kb. Recent
evidence from the ENCODE project has strongly suggested that these
regions are likely to play an important role in gene transcription
regulation (The ENCODE Consortium, 2012), and variants located in these
regions may be involved in cis- or trans-regulation of genes distantly
located. Although we report the genes nearest to the associated variants
(Fig. 4), this does not necessarily imply that it is the regulation of these
genes that is affected by the variants. Further studies on the effect of
the variants on gene expression (eQTL studies) are required to
understand how they affect biological pathways.

Intergenic rs12700667 on 7p15.2 is in high LD with rs1055144 (P =
1.0 × 10⁻²⁴; r² = 0.5, 1000G pilot CEU data; Fig. 4), which was previously
associated with waist-to-hip ratio adjusted for body mass index
(WHRadjBMI), a measure of fat distribution, in an independent GWAS
of ~190 000 individuals (Heid et al., 2010). The 7p15.2 region contains
several potential candidate genes. NFE2L3 [Nuclear factor
(erythroid-derived 2)-like 3] is a transcription factor suggested to be
involved in cell differentiation, inflammation and carcinogenesis
(Chevillard and Blank, 2011). NFE2L3 mRNA levels were found to be
up-regulated in human breast cancer cells (Rhee et al., 2008) and
testicular carcinoma tissue samples (Almstrup et al., 2007). Also,
interferon-gamma has been shown to increase NFE2L3 mRNA levels in human
uterine endothelial cells (Kitaya et al., 2007). However, its potential
role in endometriosis aetiology remains to be discovered. A second locus
of interest in this region is a microRNA, miRNA_148a, with a purported
role in the Wnt/β-catenin signaling pathway (Qin et al., 2010).
Wnt/β-catenin signaling has an important role in communication between
epithelial and stromal cells in endometrium (Tulac et al., 2003), and may
have a role in endometriosis-related infertility, and/or its development
through sex hormone homeostasis regulation (Matsuzaki et al., 2010; Wang
et al., 2010) and fibrogenesis (Matsuzaki and Darcha, 2013). In vitro
studies have also shown that, by targeting the Wnt/β-catenin
pathway, cellular mechanisms known to be involved in endometriotic
lesion development, such as cell proliferation, migration and invasion
of endometrial and endometriotic epithelial and stromal cells, can be
inhibited (Matsuzaki and Darcha, 2013). Moreover, the Wnt/β-catenin
pathway is involved in development, tissue self-renewal and
in various cancers and other diseases such as type II diabetes and
osteoporosis (Clevers, 2006; Klaus and Birchmeier, 2008; Clevers and
Nusse, 2012). MiRNA_148a has been shown to regulate adipogenesis through
modulation of the Wnt/β-catenin signaling pathway (Qin et al., 2010),
emphasizing the potentially important role of this pathway for both
endometriosis and fat distribution. More distant candidate genes for this
7p15.2 endometriosis association signal include HOXA10 (Homeobox
A10) and HOXA11 (Homeobox A11), which are ~1.35 Mb downstream.
HOXA10 and HOXA11 are members of the homeobox A family of
transcription factors that play a role in uterine development (Taylor
et al., 1999; Wu et al., 2005). It is possible that the 7p15.2 signal
influences the regulation of interactive expression of a number of these
genomic loci; functional studies targeting this region will need to be
conducted to further elucidate its role in endometriosis pathogenesis.

Figure 2 Forest plots showing the effects of risk alleles for SNPs in six loci reaching genome-wide significance for association with all endometriosis and
two loci reaching borderline genome-wide significance with only Stage III/IV cases in the meta-analysis. BBJ_Rep, BioBank Japan replication.

Figure 3 Linkage disequilibrium structure for the region containing rs1537377, rs1333049 and rs10965235 in/near CDKN2BAS on chromosome 9 in
individuals of European ancestry (bottom panel) and of Japanese ancestry (top panel) (Source: http://hapmap.ncbi.nlm.nih.gov).
Figure 4 Diagrams showing the 9 genes closest to each of the 11 endometriosis SNPs included in the meta-analysis. Exons (coding regions) are presented
as grey boxes; lines between the exons represent introns (non-coding genic regions); empty white boxes at the ends of the genes represent the 3'
and 5' UTR regions. Each endometriosis SNP is illustrated in red, along with its distance to the gene where relevant (red arrows). SNPs genome-wide
associated with other traits or diseases that are in linkage disequilibrium (r² > 0.2) with any of the endometriosis SNPs are illustrated in blue. See
Supplementary data, Table SI for a complete list of all published SNP associations for these genomic regions including independent signals.



Two further inter-genic loci did not reach genome-wide significance of
association with endometriosis on meta-analysis, even when allowing
for heterogeneous effects across studies. Rs6734792 on 2q23.3 (P =
2.2 × 10~^ with all-stage endometriosis) is located 280 kb upstream
of RND3 (Rho Family GTPase 3). The signal showed similar strength of
association for all-stage and Stage III/IV endometriosis, implying a
potential role in the aetiology of both minimal/mild and moderate/severe
disease. RND3 encodes a member of a subgroup of the Rho family of small
GTP-binding proteins. Rho GTPases are key regulators of the actin
cytoskeleton and stress fibre formation. In addition, RND3 is involved in
the regulation of cell-cycle progression, cell transformation (Chardin,
2006) and cell migration (Guasch et al., 1998). The third inter-genic
signal, rs4141819 on 2p14 (borderline significant with P = 9.2 × 10⁻⁸
in Stage III/IV-enriched analysis), is located 227 kb away from ETAA1
(Ewing's tumor-associated antigen 1), which encodes a tumour-specific
cell surface antigen in the Ewing family of tumours (EFTs). EFTs are a group
of cancers that form in bone or soft tissue and share common features
because they develop from the same type of stem cell in the body (Borowski
et al., 2006). Rs4141819 is located in an intronic region of a long
non-coding RNA (lncRNA), AC007422.1, which is 118 kb long. Non-coding
RNAs are functional RNA molecules that are not translated into proteins
but have a regulatory role in gene expression. The biological role of this
particular lncRNA is not yet known.



The remaining seven variants in six loci are located in or near
(within 50 kb) genes (Fig. 4): WNT4, GREB1, FN1, ID4, CDKN2B-AS1
and VEZT. These variants are likely to be involved in the 'cis'-regulation
of expression of neighbouring genes and/or transcripts. The current
knowledge on the biological function of the implicated genes, which
vary from WNT4 and FN1 in developmental pathways to functions of
VEZT, GREB1, ID4, NFE2L3, FN1, ETAA1 and CDKN2B-AS1 in
carcinogenesis, is described below.

WNT4 (Wingless-type MMTV integration site family member 4)
Rs7521902 is located 21 kb up/downstream of WNT4 (Fig. 4). WNT4 is a
protein-coding gene that is vital for development of the female
reproductive organs. In knockout mice, the loss of WNT4 leads to complete
absence of the Mullerian duct and its derivatives (Vainio et al., 1999). A
previous study investigated the expression of genes playing decisive roles
during female reproductive tract development, including WNT4, in peritoneal
tissue from endometriosis cases and controls. The authors showed that WNT
genes are expressed in normal peritoneum in addition to endometrium,
suggesting that endometriosis can arise through metaplasia and can in the
process make use of the developmental steps involved in the embryonic
development of the female reproductive tract (Gaetje et al., 2007). As
mentioned before, Wnt signaling is important for epithelial-stromal cell
communication in the endometrium, and is likely to be important for
endometrial development, differentiation and embryonic implantation
(Tulac et al., 2003). Furthermore, the endometriosis variant rs7521902
has also been genome-wide significantly associated with bone mineral
density (Estrada et al., 2012), highlighting this variant as a potential
pleiotropic locus (Fig. 4).

GREB1 (Growth regulation by estrogen in breast cancer 1)
Rs13394619 is located in an intronic region between exon 9 and exon 10
in GREB1. GREB1 is an early response gene in the estrogen
regulation pathway that is involved in hormone-dependent breast
cancer cell growth (Rae et al., 2005). Furthermore, Pellegrini et al.
(2012) showed increased expression of GREB1 in peritoneal ectopic
endometriotic lesions compared with eutopic endometrium, implicating
its transcription in estrogen-dependent growth in endometriosis. The
underlying biological mechanism by which GREB1 plays a role in
hormone-responsive tissues, especially in estrogen-stimulated cell
proliferation in endometriosis, remains to be elucidated. The GREB1 region
harbours many other SNPs reported to be genome-wide significantly
associated with other traits and conditions, with GREB1 in particular
associated with obesity-related traits (Supplementary data, Table SI).
However, none of these SNPs are in LD with endometriosis SNP
rs13394619.

FN1 (Fibronectin 1)

Rs1250248 is located in an intronic region between exon 10 and exon 11
in FN1 (Fig. 4). FN1 is involved in cell adhesion and migration processes
including embryogenesis, wound healing, blood coagulation, host
defense and metastasis (Pankov and Yamada, 2002). Recently, it has
been shown that SOX2, a gene encoding a transcription factor that
targets FN1, is a key gene regulating cell migration in ovarian cancer
(Lou et al., 2013). Furthermore, an in vitro study has shown that FN1
modulates CpG motif-dependent cytokine production in macrophages,
suppressing immune responsiveness through the TLR9 pathway (Yoshida
et al., 2012). The FN1 region has been associated with many other
traits in GWASs (Supplementary data, Table SI); however, none of
these SNPs are in LD with endometriosis SNP rs1250248.

ID4 (Inhibitor of DNA binding 4)

Rs7739264 is located in an intronic region of lncRNA RP1-167F1.2
(794 bp), for which the biological function remains to be discovered. It
is located 52 kb downstream of ID4 (Fig. 4). ID4 is an ovarian oncogene
that is over-expressed in most primary ovarian cancers but not in normal
ovary, fallopian tube and other tissues. Furthermore, it has been
implicated, through methylation-related regulatory pathways, in breast
carcinogenesis (Verschuur-Maes et al., 2012) and is overexpressed in
most ovarian, endometrial and breast cancer cell lines (Ren et al.,
2012). Potential mechanisms by which ID4 induces transformation
include transcriptional programmes, acting through regulation of HOXA9
and CDKN1A (cyclin-dependent kinase inhibitor 1A), that disrupt the normal
regulation of cell proliferation and differentiation (Ren et al., 2012).
HOXA genes have been shown to play essential roles in specifying regional
differentiation of the Mullerian duct into oviduct, uterus, cervix and
vagina (Kobayashi and Behringer, 2003). The region around endometriosis
SNP rs7739264 contains a large number of reported SNPs associated
with different traits and conditions in GWASs (Supplementary
data, Table SI), none of which are in LD with rs7739264.



CDKN2B-AS1 (Cyclin-dependent kinase inhibitor 2B antisense RNA)
Rs1537377 is located 48 kb upstream of CDKN2B-AS1, while rs1333049
is located in the 3' UTR region of the gene and rs10965235, which is
monomorphic in European populations, is in the intron between exons 16
and 17 of the gene (Fig. 4). The CDKN2B-AS1 locus encodes
cyclin-dependent kinase inhibitor 2B antisense RNA. In the same LD block
as this gene are CDKN2A (p16), CDKN2B (p15) and ARF (p14), which
are all recognized tumour suppressor genes. CDKN2B-AS1 has been
shown to be involved in the regulation of CDKN2B, CDKN2A and ARF
expression (Pasmant et al., 2007; Jarinova et al., 2009; Liu et al., 2009).
Inactivation of CDKN2A, through loss of heterozygosity or hypermethylation
of its promoter, has been reported in endometriosis and endometrial
cancer (Goumenou et al., 2000; Martini et al., 2002; Guida et al.,
2009). SNPs in or near the CDKN2B-AS1 locus have been associated
with many other traits and diseases (Supplementary data, Table SI),
with a number in LD (r² > 0.2) with the endometriosis SNPs (Fig. 4),
including rs1063192 and rs2157719 with glaucoma (Osman et al.,
2012; Wiggs et al., 2012), rs4977756 with glioma (Rajaraman et al.,
2012), rs10757269 with ankle-brachial index (Murabito et al., 2012),
rs6475606 and rs10757272 with intracranial aneurysm (Foroud et al.,
2012; Low et al., 2012), rs1537370 with coronary artery calcification
(van Setten et al., 2013) and rs7865618, rs10757274, rs1333042 and
rs944797 with coronary heart disease (Wild et al., 2011; Lu et al.,
2012; Takeuchi et al., 2012; Lee et al., 2013). Because of these diverse
associations, the function of this locus is an area of research for many
investigators.

VEZT (Vezatin)

Rs10859871 is located 17 kb upstream of VEZT (Fig. 4). The locus was
the second signal that showed similar strength of association for
Stage III/IV disease [OR = 1.19 (95% CI: 1.11-1.27), P = 6.8 ×
10"^] versus all endometriosis [OR = 1.18 (95% CI: 1.13-1.22), P =
4.8 × 10⁻¹⁵]. VEZT encodes an adherens junction transmembrane
protein. Vezatin expression has been shown to be down-regulated in
gastric cancer patients through methylation of its promoter (Guo et al.,
2011; Miao et al., 2013). It is a putative tumour suppressor gene, targeting
cell migration and invasion genes, growth genes, cellular adhesion genes
and a functionally validated cell cycle progression gene called TCF19
(transcription factor 19) (Miao et al., 2013). TCF19 was found to be
associated with lymphocyte count, mean cell hemoglobin, white blood cell
count, hematocrit and eosinophil count (Ferreira et al., 2009),
hinting at a potential role of TCF19 regulation through VEZT in maintaining
immunological balance. In the Japanese population, rs10859871 is in
LD with rs12310399 (r² = 0.57; Fig. 4, Supplementary data, Table SI),
located 121 kb downstream of VEZT, a variant that has been associated
with adverse response to chemotherapy (Low et al., 2013).

Conclusions and future directions 

Our meta-analysis demonstrated directionally consistent, genome-wide
association of SNPs in six genetic loci with endometriosis across European
ancestry populations in Australia, Belgium, Italy, the UK and the USA, as
well as Japanese ancestry populations: rs12700667 on 7p15.2, rs7521902 near
WNT4, rs10859871 near VEZT, rs1537377 near CDKN2B-AS1, rs7739264
near ID4 and rs13394619 in GREB1. Five of the six loci showed stronger
effect sizes of association with Stage III/IV disease, the exception being
VEZT. Of the remaining three loci, two (FN1 and inter-genic 2p14) were
borderline genome-wide significant for association in Stage III/IV-only or
Stage III/IV-enriched analyses, respectively. Except for inter-genic
rs4141819 and rs6734792 on chromosome 2, none of the other SNPs/loci
showed significant evidence of heterogeneity across datasets (P >
0.005), showing that the significantly associated risk variants of
endometriosis are pertinent to all studied populations.
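
The heterogeneity assessment summarized above rests on pooling per-study effect estimates and asking how much they scatter around the pooled value. The following is a minimal sketch of an inverse-variance fixed-effect meta-analysis with Cochran's Q and the I² statistic. The per-study inputs are hypothetical, the function name is illustrative, and the random-effects model used in the meta-analysis is not reproduced here.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations of study effects from the pooled effect
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: percentage of variation attributed to heterogeneity rather than chance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"or": math.exp(beta), "se": se, "z": beta / se, "q": q, "df": df, "i2": i2}

# Hypothetical per-study estimates (log OR and standard error), for illustration only
betas = [math.log(1.20), math.log(1.15), math.log(1.10)]
ses = [0.05, 0.04, 0.06]
res = fixed_effect_meta(betas, ses)
print({key: round(value, 3) for key, value in res.items()})
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-squared distribution with (number of studies - 1) degrees of freedom to obtain the heterogeneity P-value.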

It was notable that of the 11 SNPs included in the meta-analysis, 5
were in LD (r² > 0.2) with an SNP that has also robustly been associated
with different diseases and traits in other GWASs: in or near WNT4
with bone mineral density; in CDKN2B-AS1 with glaucoma, glioma,
ankle-brachial index, intracranial aneurysm, coronary artery calcification
and coronary heart disease; in VEZT with adverse response to
chemotherapy; and in 7p15.2 with fat distribution. This information is
important, as the investigation of other established disease variants in
genomic regions associated with endometriosis can aid in revealing the
potential biological mechanisms through which the variant may act upon
endometriosis pathogenesis, and can lead to new investigations of joint
aetiology and co-morbidity.

Detailed and standardized disease phenotype 

Our results showing stronger associations with Stage III/IV
endometriosis emphasize the importance of detailed sub-phenotype collection
to allow analyses for the identification of further variants associated
with sub-types of endometriosis. Currently, datasets with detailed
surgical and clinical sub-phenotype information are not available on a large
scale, but such information collected in a standardized manner is
required to enable future genomic research. The global WERF
Endometriosis Biobanking and Phenome Harmonisation Project (EPHect),
currently involving 32 clinical and basic endometriosis research centres
and 3 industrial collaborators, is developing freely available data
collection tools to enable standardized data collection, and thus foster
future collaborative analyses across endometriosis research centres
(http://www.endometriosisfoundation.org/ephect).

Larger sample sizes 

Although this may be surprising, the sample sizes of the endometriosis
GWASs to date are at the lower end of GWASs in other complex
disease fields, and an increased sample size for genome-wide
meta-analysis of GWASs is predicted to increase the number of
genome-wide significant loci (Visscher et al., 2012). For instance,
recent meta-GWAS analyses of type 2 diabetes have included close to
100 000 cases, and have identified around 65 genome-wide significant
variants associated with disease (Morris et al., 2012); the count of
genome-wide significant loci associated with inflammatory bowel
disease, involving up to 37 000 cases and controls, is as high as 163
(Jostins et al., 2012). This further emphasizes the need for collaborations
between centres, which collect standardized characteristics of the
disease as well as detailed symptoms, to increase endometriosis case
numbers and allow further GWAS datasets to be generated. Undoubtedly,
larger GWASs and meta-analyses in different populations will
allow the detection of additional common causal variants of modest
effects on endometriosis risk.
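
To illustrate why sample size matters so much for variants of modest effect, the sketch below gives a rough normal-approximation estimate of power for an allelic test at the genome-wide threshold. The OR, risk allele frequency and sample sizes are hypothetical, the function name is illustrative, and the variance formula assumes similar allele frequencies in cases and controls; it is not the power calculation used in any of the cited studies.

```python
import math
from statistics import NormalDist

def approx_power(odds_ratio, raf, n_cases, n_controls, alpha=5e-8):
    """Rough power of an allelic test at significance threshold alpha.

    The variance of the log OR is approximated from a 2 x 2 allele-count
    table as 1 / (2 * N * p * (1 - p)), summed over cases and controls,
    with the same allele frequency p used for both groups (an assumption).
    """
    var = (1 / (2 * n_cases * raf * (1 - raf))
           + 1 / (2 * n_controls * raf * (1 - raf)))
    expected_z = abs(math.log(odds_ratio)) / math.sqrt(var)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the test statistic exceeds the critical value
    # in the direction of the true effect
    return NormalDist().cdf(expected_z - z_crit)

# Hypothetical variant: OR = 1.10, risk allele frequency = 0.30
small = approx_power(1.10, 0.30, n_cases=5000, n_controls=15000)
large = approx_power(1.10, 0.30, n_cases=20000, n_controls=60000)
print(f"power with 5000 cases:  {small:.3f}")
print(f"power with 20000 cases: {large:.3f}")
```

Under these assumptions, the same variant moves from being very unlikely to reach genome-wide significance to being detected in most studies when the sample is quadrupled.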

Better coverage of genomic variation 

GWASs are limited in that they can detect only the effects of
common variants (MAF > 0.05). It is likely that some of the unexplained
genetic variation may be due to rarer variants (MAF < 0.05), either
single-site or structural, that are not captured by current GWA genotyping
arrays (Visscher et al., 2012). As mentioned before, family-based
linkage studies of endometriosis have successfully identified two
linkage regions that are likely to harbour rare causal variants, on
chromosome 10q26 (Treloar et al., 2005) and on chromosome 7p13-15
(Zondervan et al., 2007). Systematic resequencing studies of these regions
are required to identify the rare variants involved.

Functional studies 

To understand the roles of the identified genetic variants in
endometriosis, functional studies are needed in tissues relevant to
endometriosis, such as eutopic and ectopic endometrium. Functional studies
aim to understand how variation at the DNA level translates to the
(regulation of) RNA transcript levels, protein levels and metabolite
levels. These studies are crucial in revealing the biological mechanisms
by which the genetic variations detected are causally related to the
disease end-points. To allow replication of findings between studies,
and collaborative analyses, centres collecting tissues for the purpose of
endometriosis research need to use similar standard operating
protocols (SOPs) for collection, processing and storage of samples. In
addition to guidelines and standards for data collection, EPHect also
provides freely available consensus SOPs for biological sample collection
that will allow large-scale collaborative functional studies contributing to
biomarker and drug target discovery research.

Supplementary data 

Supplementary data are available at http://humupd.oxfordjournals.org/. 